Understanding the Impact of Population and Cancer Type on Tumor Mutation Burden Scores: A Comprehensive Whole-Exome Study in Cancer Patients From India

The study depicts the significance of TMB as a biomarker that varies across cancers and populations/ethnicities.


INTRODUCTION
Rapid advancements in immune checkpoint inhibitor (ICI) therapy have improved treatment outcomes in terms of quality of life, progression-free survival, and overall survival across multiple cancer types. 1 As ICI becomes a more viable treatment option for patients with cancer, it is crucial to have a clear understanding of the biomarkers used to predict the response to ICI therapy. There are three major predictive biomarkers associated with ICI, namely PD-L1 expression, microsatellite instability (MSI), and tumor mutation burden (TMB). 2,3 While PD-L1 is a protein expression biomarker, MSI and TMB are genome-wide signatures derived from tumor DNA profiling. 4-6 The range of TMB scores, however, varies greatly with cancer type, sequenced region size, nature of mutations (synonymous or nonsynonymous), choice of genes, variant filtration strategy to eliminate common polymorphisms in the population, and inherent biology of the tumor. 7 In addition to origin and geographic variation, other factors, such as lifestyle (smoking, alcohol, predisposition to other diseases), occupational hazards, food, and ancestry, contribute to increased risk of cancer incidence. 9
Although Asia contributes 40% of the world's cancer burden, Asian patients account for <15% of participants in global clinical studies in oncology research. 10 Most of the US Food and Drug Administration (FDA) approvals for companion diagnostics are based on research and findings from NGS panels derived from the non-Hispanic White Caucasian population. The major factors that contribute to this disparity include population-specific polymorphisms, tumor-specific mutations, and their aggressive behavior, which varies across individuals in specific populations.
A recent study led by Nassar et al states that TMB calculated using standard methods developed and approved by the FDA is overestimated in the African population as compared with the East European population, on the basis of real-world evidence from ICI outcomes. According to the authors, nearly 21% of patients of European ancestry had false high-TMB misclassification, whereas almost 37% of patients of Asian descent and 44% of patients of African descent had their TMB score misclassified. 10 From this study, it is evident that ethnicity and geography play a significant role in clinical outcomes. Therefore, it is warranted to establish these biomarkers in different populations to stratify patients who are eligible for ICI, thus avoiding treatment-related toxicities and reducing the cost of treatment. This study discusses the importance of understanding variation in a tumor genomic signature, TMB, in Indian patients versus their western counterparts, to stratify patients who would respond to ICI therapy with minimum toxicity.
In the past 2-3 years, there has been increasing evidence on the potential role of population and ethnicity in ICI outcomes in multiple cancers. One such clinical study of 207 patients (non-small-cell lung cancer [NSCLC] + head and neck squamous cell carcinomas), which evaluated the response to ICI with or without chemotherapy, observed that racial or ethnic disparity had an impact on the objective response rate (ORR) as well as OR in these patients with cancer. The ORR for Hispanic (H) and Black (B) patients was lower compared with non-Hispanic White (W) patients, although the difference was not statistically significant (H: 27.0%, B: 32.5%, W: 38.7%; H v W: P = .209; B v W: P = .398). When considering only patients treated with ICI monotherapy, the ORR for Hispanic patients dropped further to 20.7%, while the ORR of Black and non-Hispanic White patients remained about the same (B: 29.3% and W: 35.9%; H v W: P = .133; B v W: P = .419). Immune-related adverse events were the lowest in the Hispanic population, occurring in only 30% of patients compared with 40% of patients in the Black cohort and 50% of patients in the non-Hispanic White cohort. 11
In this study, we used a whole-exome sequencing (WES)-based pipeline for TMB calculation in 1,233 Indian patients with cancer, with an indigenously developed strategy for the prediction of true somatic mutations, and adopted a robust nonguided machine learning approach to understand the distribution of TMB scores. To the best of our knowledge, this is the largest clinical study of its kind from any South Asian country, including India. Predicting the TMB score distribution across different cancers is relevant not only for selecting patients who have a high chance of responding to ICI but also for reducing the burden of treatment cost and toxicity for patients who may not respond to ICI. We also performed a comparative analysis with publicly available data from The Cancer Genome Atlas (TCGA), which comprise primarily patients of Caucasian/East European ancestry, to understand the role of genetic diversity and ethnicity. 12

Indian Patient Cohort
The Indian cohort comprises tumors from a total of 1,233 patients (advanced stage: stage III/IV) across 30 different cancer types, which were processed from December 2020 to January 2022 (Fig 1). This study was conducted according to the principles of the Declaration of Helsinki and as per the International Council on Harmonisation and Good Clinical Practice guidelines. 13,14 All the data analyzed as part of this study come from a retrospective analysis of patients with cancer, and written informed consent was obtained from these patients to use this deidentified information for research purposes. This study was approved by an independent ethics committee and review board (JCDC, India).

Bioinformatics Pipeline for Variant Calling
The raw sequencing reads were checked for QC using the FastQC tool and trimmed for adapters and a base quality cutoff of Phred score Q30. 15 High-quality sequencing reads were processed in a comprehensive Illumina DRAGEN Somatic Pipeline (Illumina DRAGEN Bio-IT Platform v3.6), which maps the reads to the human reference genome (GRCh37) and then detects variants, including single-nucleotide variations (SNVs) and small insertions/deletions (INDELs; Fig 2).

Categorization of the Samples on the Basis of the QC Metrics
After the initial analysis, four NGS-QC parameters were selected for scoring the samples: the mean target coverage depth, uniformity of coverage, percent duplicate aligned reads, and base enrichment (Table 1). 16 These parameters were given equal weightage, and each sample was scored as 0, 1, or 2 on each parameter, followed by the calculation of the cumulative score (Appendix Fig A2). The cumulative scores were then used to categorize each sample as a good (score 6-8), intermediate (score 3-5), or poor (score 0-2) quality sample. After careful consideration, samples with a QC score ≥3 were considered as pass and samples with a QC score <3 were considered as fail. After applying this filtering approach, the cohort size reduced to n = 973 from N = 1,233 samples. This final subset was used to understand the trends in TMB scores across different cancers in this cohort.
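The scoring rule described above can be sketched in Python as follows. This is a minimal illustration only: the thresholds that turn each raw QC metric into a 0, 1, or 2 are defined in Table 1 and are not reproduced here, so the sketch takes the four per-parameter scores as input. The function name `categorize_sample` is hypothetical.

```python
def categorize_sample(parameter_scores):
    """Return (cumulative_score, category, passed) for one sample.

    parameter_scores: four integers, each 0, 1, or 2, one per QC parameter
    (mean target coverage depth, uniformity of coverage, percent duplicate
    aligned reads, base enrichment). How raw metrics map to 0/1/2 is not
    modeled here.
    """
    if len(parameter_scores) != 4:
        raise ValueError("expected exactly four QC parameter scores")
    if any(s not in (0, 1, 2) for s in parameter_scores):
        raise ValueError("each QC parameter score must be 0, 1, or 2")

    total = sum(parameter_scores)  # equal weightage: plain sum, range 0-8
    if total >= 6:
        category = "good"
    elif total >= 3:
        category = "intermediate"
    else:
        category = "poor"
    passed = total >= 3  # pass threshold stated in the text
    return total, category, passed
```

Under this rule, every good or intermediate sample passes and every poor sample fails, so the pass/fail flag is simply a coarser view of the three categories.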

TMB Calculation Workflow Establishment
The TMB calculation workflow was established in three stages, in which different aspects of each variant, including its quality, nature, type, and clinical significance, were considered to rank it as a true somatic variant. The total number of true somatic mutations was then divided by the size of the exome panel to obtain the TMB score.
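As a small worked illustration of the final division, the sketch below computes a TMB score in mutations per megabase. The function name `tmb_score` and the example numbers are hypothetical; the actual target size of the exome panel is not stated in this section.

```python
def tmb_score(n_true_somatic, target_size_bp):
    """TMB in mut/Mb: true somatic variant count divided by target size in Mb."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    if n_true_somatic < 0:
        raise ValueError("variant count cannot be negative")
    target_size_mb = target_size_bp / 1_000_000
    return n_true_somatic / target_size_mb


# Hypothetical example: 350 true somatic variants over an assumed 35 Mb target.
example = tmb_score(350, 35_000_000)
```

Because the denominator is the sequenced target size, the same variant count yields different scores on panels of different sizes, which is one reason the text treats panel size as a source of variation.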

Stage 1-High-Quality Coding Variants Filtration
The detected variants (SNVs/INDELs) were systematically filtered on the basis of variant location and variant type. Only high-confidence variants with a minimum quality of 10 (quality score from Illumina DRAGEN Bio-IT Platform v3.6) and a minimum depth of 30× at the variant location were considered for the analysis (Fig 2). Synonymous variants were removed, and the remaining coding variants were retained for the downstream analysis.
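A minimal Python sketch of the Stage 1 filter follows. It assumes each variant is a dictionary with hypothetical keys `qual`, `depth`, and `consequence`, and it reads the depth cutoff as 30 reads at the variant position. The consequence labels in `CODING_NONSYNONYMOUS` are illustrative placeholders, not the annotation terms used by the authors' pipeline.

```python
MIN_QUAL = 10   # minimum variant quality stated in the text
MIN_DEPTH = 30  # assumed reading of the depth cutoff (30x)

# Assumed labels for coding, nonsynonymous consequences (illustrative only).
CODING_NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "inframe_indel"}


def stage1_filter(variants):
    """Keep high-confidence coding variants; drop synonymous and noncoding ones."""
    kept = []
    for v in variants:
        if v["qual"] < MIN_QUAL:
            continue  # low-confidence call
        if v["depth"] < MIN_DEPTH:
            continue  # insufficient coverage at the variant position
        if v["consequence"] not in CODING_NONSYNONYMOUS:
            continue  # synonymous or noncoding
        kept.append(v)
    return kept
```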

Stage 2-Germline Filtration
The germline variants were removed using a sequential three-level filtering approach adapted from Parikh et al (2020). 20 In level 1 (tolerant approach), the global population frequency and South Asian frequency were used to remove the polymorphic variants (>1% of the population) from the cohort. In level 2 (stringent approach), the variant allele frequency was used to remove the germline heterozygous and homozygous variants. Here, variants with allele frequencies ≤0.05 and 0.5 ± 0.05 were removed to eliminate the germline variants. In level 3 (baseline approach), variants were filtered on the basis of the in-house baseline (germline samples). The baseline (reference genome pattern) was created by pooling germline variants derived from WES data from healthy individuals (4baseCare unpublished data; Fig 2).
On the basis of this validation analysis, we were able to establish confidence and robustness in our TMB workflow that could be implemented in the remaining clinical samples of the cohort. The count of true somatic variants was derived using the group B algorithm (tumor-only workflow), and it was divided by the target size to estimate the TMB score for a given tumor sample.
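The three-level germline filter can be sketched in Python as shown here. This is an illustration under stated assumptions, not the authors' implementation: the dictionary keys (`global_af`, `south_asian_af`, `vaf`, `key`) and the representation of the in-house baseline as a set of variant identifiers are hypothetical.

```python
POP_AF_CUTOFF = 0.01           # level 1: polymorphic if >1% in the population
LOW_VAF_CUTOFF = 0.05          # level 2: remove variants with VAF <= 0.05
HET_VAF_RANGE = (0.45, 0.55)   # level 2: remove variants with VAF of 0.5 +/- 0.05


def is_probable_somatic(variant, baseline_keys):
    """Return True if the variant survives all three germline filtering levels."""
    # Level 1 (tolerant): population frequency, global and South Asian.
    if variant["global_af"] > POP_AF_CUTOFF or variant["south_asian_af"] > POP_AF_CUTOFF:
        return False
    # Level 2 (stringent): variant allele frequency in the tumor sample.
    vaf = variant["vaf"]
    if vaf <= LOW_VAF_CUTOFF or HET_VAF_RANGE[0] <= vaf <= HET_VAF_RANGE[1]:
        return False
    # Level 3 (baseline): variant seen in the pooled germline baseline.
    if variant["key"] in baseline_keys:
        return False
    return True


def count_true_somatic(variants, baseline_keys):
    """Count the variants that pass the three-level filter."""
    return sum(1 for v in variants if is_probable_somatic(v, baseline_keys))
```

The levels are applied in the order given in the text; because each level only removes variants, the final count does not depend on that order.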

Trend Analysis of TMB Scores Using Bootstrap Resampling Approach From Indian Cohort
In the context of the cancer genome landscape, somatic mutations are the primary variables that contribute to interpatient variability and hence to the TMB scores. 21 We adopted a statistical model using the percentile distributions of TMB scores and a bootstrap resampling approach using the base package in R (The R Foundation for Statistical Computing, Vienna, Austria). 22 In this unsupervised approach, 1,000 iterations of a phantom data set (randomly resampled cohorts) were generated from the primary cohort (n = 973; QC score ≥3). This phantom data set was used to calculate the average TMB score at the ninth decile (the 90th percentile), which was observed to be 21.71 mutations/megabase (mut/Mb; Fig 3B).
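The bootstrap step can be sketched as follows. The study used base R; this Python version is an illustrative equivalent only, and details such as the resample size (taken here to equal the cohort size), the quantile interpolation method, and the seed are assumptions rather than reported settings.

```python
import random
import statistics


def bootstrap_p90(tmb_scores, n_iter=1000, seed=0):
    """Average 90th-percentile TMB over n_iter bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(tmb_scores)
    p90_values = []
    for _ in range(n_iter):
        # Resample the cohort with replacement, keeping the original size.
        resample = rng.choices(tmb_scores, k=n)
        # quantiles(..., n=10) returns the nine decile cut points;
        # index 8 is the ninth decile (90th percentile).
        deciles = statistics.quantiles(resample, n=10, method="inclusive")
        p90_values.append(deciles[8])
    return statistics.mean(p90_values)
```

Averaging the ninth decile across resamples gives a more stable estimate than a single percentile from the observed cohort, and the spread of `p90_values` could also be used to attach an interval to it.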

Distribution of TMB Across Cancer Types in Indian Cohort
In this study, we observed a broad distribution of TMB, in the range of 0-161.25 mut/Mb.

Comparison of TMB Score Distribution Between Indian and TCGA Cohort Across Six Cancer Types
The TMB range in the Indian cohort (0-161.25 mut/Mb) differs from that in the TCGA cohort (0-424.8 mut/Mb). We overlaid the percentile distributions and the medians of TMB scores (compared using the Kruskal-Wallis test) in these two cohorts (n ≥ 30 for each cancer type). The most critical finding of this study is a significantly different TMB score distribution between the two population cohorts, which was particularly evident in four cancer types: sarcoma, ovarian, head and neck, and breast. In lung and colorectal cancer, surprisingly, we observed a similar score and trend distribution. (Details of the analysis are summarized in the Data Supplement.) Our observation might provide some clues to explain the differences in underlying biology, and hence in the spectrum of mutations and their evolution, between the two populations (Fig 5).
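For readers who want to see what a two-cohort Kruskal-Wallis comparison involves, the sketch below computes the tie-corrected H statistic and its chi-square P value (1 degree of freedom) in plain Python. It is an illustration of the test named above, not the authors' analysis code; the function name and the toy inputs are hypothetical.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two independent samples, with tie correction."""
    pooled = sorted((x, g) for g, xs in enumerate((a, b)) for x in xs)
    n = len(pooled)
    rank_sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i  # size of this group of tied values
        tie_term += t**3 - t
        i = j
    h = 12 / (n * (n + 1)) * (
        rank_sums[0] ** 2 / len(a) + rank_sums[1] ** 2 / len(b)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:
        raise ValueError("all values are identical; the test is undefined")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    # Chi-square survival function with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Toy example with two fully separated groups of hypothetical scores.
h_stat, p_value = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
```

The test works on ranks, so it is insensitive to the long right tail typical of TMB distributions. The chi-square approximation is rough for very small groups, which is consistent with restricting the comparison to cancer types with n ≥ 30.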

DISCUSSION
TMB is one such predictive immunotherapy biomarker that has gained importance in the past 5 years after the FDA approval of pembrolizumab. 26 However, on the basis of outcome data from recent clinical studies, the use of a pan-cancer TMB score threshold as a predictive biomarker has remained a limitation in patient stratification. 27 A review from the Japanese Society of Medical Oncology/Japan Society of Clinical Oncology/Japanese Society of Pediatric Hematology/Oncology suggested that the optimal TMB cutoff differs according to the cancer type. 28 Hence, it is important to ...
Interpopulation comparison of TMB before and after the approach showed a significant difference in the mutation signature of the patients, which indicates the need for an unbiased bioinformatics approach for each population and ancestry. 29 In our study, we have taken steps to remove population-specific polymorphisms and germline variants by deploying the three-step filtration strategy. Assuming a universal criterion and cutoff for all ethnicities and geographical areas may therefore have adverse effects on patients with cancer.
In this study, we profiled Indian patients with cancer using WES to estimate the TMB scores and observed variation in the distribution of TMB scores across cancer types. For example, head and neck and renal cancers fall into the category of a narrow range of TMB scores (≤10 mut/Mb), whereas breast, lung, colon, pancreatic, and other cancers fall into the category of a broad range of TMB scores (0-161.25 mut/Mb).
Comparative analysis of TMB scores between the Indian and TCGA cohorts showed higher TMB scores in a few cancers in Indian patients as compared with their TCGA counterparts (Fig 5). The Kruskal-Wallis test was used to identify differences in the median for six cancer types between the two cohorts. There was a significant difference in the median, the TMB score distribution, and the proportion of patients who may respond to ICI across four cancer types (sarcoma, ovary, head and neck, and breast). In contrast, similar scores were observed across the 75th-95th percentiles in lung cancer, indicating that the efficacy of ICI therapy in patients with lung cancer may not vary significantly between the Indian and TCGA cohorts. When we examined the TMB score distribution at the 95th percentile, we observed a trend toward a higher cutoff for patient selection in the Indian cohort as compared with TCGA for sarcoma, ovary, and breast. We observed similar trends in lung and colorectal, whereas in head and neck we observed a lower score cutoff in the Indian data as compared with TCGA (Fig 5). In other words, the thresholds that determine the responders to ICI on the basis of the TMB value may differ between the Indian and TCGA cohorts.
In summary, our findings reiterate the need to establish population-specific and cancer-specific TMB thresholds for the stratification of patients with cancer for ICI. In addition, this study indicates that TMB scores can be calculated accurately from tumor-only NGS data, which reduces the cost burden for these patients. It may also provide leads to act on unique pathways that drive cancers on the basis of their genomic signatures and mutational patterns.
To our knowledge, this publication is the first report from India to examine the TMB score distribution across multiple cancers in a large cohort of Indian patients with cancer (N = 1,233).

Introduction
Whole-exome sequencing (WES) is considered the gold standard for evaluating TMB because it provides comprehensive coverage of the coding region of the human genome. Multiple studies have established that the calculation of the TMB score is influenced not only by the size of the sequencing panel and the choice of genes but also by the bioinformatics workflow used in different laboratories. Variations in TMB calculations across laboratories can also be due to differences in genomic diversity in the population, as well as variability in cancer risk predisposition and its etiology.
In the proposed approach, the impact of population-specific diversity in polymorphisms and rare variants is incorporated into the variant filtration strategy.This is believed to provide a more accurate TMB score and better stratification of patients with cancer for the selection of immune checkpoint inhibitors, reducing toxicity and lowering the health care cost burden for the patient.

Study Design
This research was a retrospective observational study of WES data from patients with cancer collected between 2020 and 2022. TMB calculation is based on archived tumor specimens (FFPE blocks). In accordance with the REMARK recommendations, the adequacy of tumor content was maintained uniformly by enforcing a strict cutoff of >10% tumor load, with a minimum of 150 viable tumor cells per high-power field (HPF) and a minimum of two HPFs in the hematoxylin and eosin examination of the specimen. Only specimens that met these criteria were considered for further processing.
FIG 2. End-to-end workflow from tumor-only samples to estimate the TMB in a clinical setting. This workflow includes library preparation and sequencing, alignment to the human reference genome (GRCh37), quality-based categorization of samples, variant calling for SNV and INDELS, and variant filtration followed by TMB estimation. FFPE, formalin-fixed paraffin-embedded.

Stage 3-Statistical Approach for Tumor-Only Samples
Variant calling from tumor-only samples may include both rare germline and somatic variants, resulting in overestimated TMB scores. To remove the bias and germline variants and further validate the performance of the TMB filtration strategy, we used a training set of 20 matched tumor-normal samples (group A) and 20 tumor-only samples (group B). In group A, germline variants were removed by subtracting the variants found in matched blood samples from those found in FFPE to get true somatic variants. In group B, germline variants were removed by a three-level filtering strategy (Fig 2). In both groups, all the other variables (confidence value, quality parameters, and depth) were kept constant. The TMB scores of group A (tumor-normal pair) and group B (tumor only) depicted a similar trend. Independent test data of an additional 40 samples (group C: 20 tumor-normal and group D: 20 tumor-only) were used to accurately predict the TMB score from tumor-only samples. The Pearson correlation coefficient between tumor-only and tumor-normal samples was determined to be 0.94, which depicts a high correlation between the two groups (Fig 3A).
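The correlation check between the two workflows can be illustrated with a short Python sketch. It assumes that each tumor has a TMB score from both the tumor-normal and the tumor-only workflow, so that the two lists are paired by sample; the function name and the toy values are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two values each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy paired scores (mut/Mb): tumor-normal vs tumor-only for the same tumors.
tumor_normal = [1.0, 2.0, 3.0, 4.0]
tumor_only = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(tumor_normal, tumor_only)
```

A high r shows that the two workflows rank and scale samples similarly, but it does not by itself rule out a constant offset, as the toy example (where tumor-only scores are exactly double) makes clear.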

FIG 3. (A) Correlation of TMB scores between group A: tumor normal (n = 20 samples) and group B: tumor only (n = 20 samples) depicts correlation coefficient, r² = 0.94. (B) Statistical approach using percentile distributions of TMB scores and bootstrap resampling (machine learning unsupervised approach) depicts clustering of TMB score at the ninth decile with an average TMB score of 21.71 mutation/Mb. TMB, tumor mutation burden.
FIG A1. Schematic representation of TMB: The total number of somatic or acquired mutations per coding area of a tumor genome (mut/Mb) is known as TMB. TMB-High indicates an increased number of somatic mutations and therefore predicts a good response to immune checkpoint inhibitors. TMB, tumor mutation burden.

TABLE 1. Quality Control Parameters Tested at Preanalytical, Analytical, and Postanalytical Phases to Ensure the Samples Meet Analysis Criteria
NOTE. This table shows the quantity and quality of DNA before library preparation and bioinformatics QC (depth of sequencing coverage). Abbreviations: HPF, high power field; Pct, percentage; QC, quality control.
FIG 4. TMB varies among cancer types from Indian cohort: Distribution of TMB scores of n = 973 (QC score ≥3, QC passed samples) patients across 28 (excluding mixed and rare cancer types with less representation) different tumors from Indian cohort using box plot. The bottom of the box represents the 25th percentile, and the top of the box represents the 75th percentile. GE, gastroesophageal; QC, quality control; TMB, tumor mutation burden.
FIG 5. Distribution of TMB score percentile: Image showing the distribution of TMB scores across six different cancer types (n ≥ 30 for each cancer type) in Indian and TCGA cohort. TCGA, The Cancer Genome Atlas; TMB, tumor mutation burden.
Considering the cancer burden and heterogeneity in India, our cohort of patients is still a biased population because of a random collection of samples from different parts of India for various cancer types. Several technical factors such as tissue processing, representative tumor material, choice of NGS panel, bioinformatics pipeline, stage of the disease, and variant filtration strategy could affect the TMB score calculation. Future studies on a much larger cohort of patients with cancer with adequate representation for all the rare cancers, such as sarcomas, gliomas, and others, may throw some light on TMB and its role in predicting ICI response in these cancers.